So welcome, everybody, to my presentation. I'm Tilman. I work for a company called CrowdStrike,
which is an American startup that deals with targeted attacks. But today I'm going to talk
about something else. I'm going to talk about one of my favorite topics, one of my hobbies,
which is peer-to-peer botnets. And peer-to-peer botnets are interesting because they're designed
to be resilient against attacks, right? And I'm usually trying to attack botnets and have
fun with them. So let's see. So, yeah, there's an agenda.
Okay. Let's start with a quick introduction to peer-to-peer botnets. I guess most people
in the room here are familiar with peer-to-peer networks in general. I mean, there are networks
like, you know, BitTorrent, eDonkey, like file-sharing networks, and others. And usually
the purpose is to build a decentralized infrastructure that's
self-reorganizing. So if parts of the infrastructure go offline, you know, it recovers itself and
so on. And usually people build peer-to-peer networks in general because they want to get
rid of any central components so the infrastructure cannot be taken down so easily.
When you analyze a peer-to-peer network of some sort, you want to understand the protocol
first.
That's not too much of a problem for all the popular file-sharing networks because they're
well documented. But if you look at peer-to-peer botnets, well, they usually use their proprietary
protocols that you have to reverse engineer first and understand first. So you have to
look at the samples, do the reverse engineering and so on. But if you do that for several
peer-to-peer networks, you will at some point see that there are different approaches. One
is based on gossiping. So if you think about that, you have all these different nodes that
are interconnected somehow and you want to propagate some information in this peer-to-peer
network. You can either do that by what we call gossiping. So each peer kind of gossips
information to its neighbors. So basically forwards information to all its neighbors
and these do the same and so on. But if you think about that, that's
probably not very efficient, right? Because several peers will probably receive the same
information several times. So you fill up the network with more traffic than you actually want
to or have to.
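As a rough illustration of that overhead, here is a minimal Python sketch of flooding one message through a small network. The topology and all names are made up for the example; they are not taken from any real botnet.

```python
from collections import deque

def gossip(neighbors, origin):
    """Flood one message from `origin`; return (peers reached, messages sent)."""
    seen = {origin}
    queue = deque([origin])
    sent = 0
    while queue:
        peer = queue.popleft()
        for n in neighbors[peer]:
            sent += 1  # every forward costs a message, even if n already has it
            if n not in seen:
                seen.add(n)
                queue.append(n)
    return len(seen), sent

# Hypothetical five-node topology with undirected links.
neighbors = {
    "a": ["b", "c"],
    "b": ["a", "c", "d"],
    "c": ["a", "b", "d"],
    "d": ["b", "c", "e"],
    "e": ["d"],
}
reached, sent = gossip(neighbors, "a")
print(f"{reached} peers reached with {sent} messages")
```

Four messages would be enough to inform the other four peers, but in this sketch every peer forwards to all of its neighbors, so most of the traffic is redundant.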
So more advanced peer-to-peer networks use what people call an overlay network. So you
have addressing on top of, you know, the general addressing methods like IP. So every peer
has an ID or some sort of address and then there is a routing method so you can address
specific peers. And if you want to send information to a specific peer, then, well,
if you know its address, you can route that through the peer-to-peer network. An example
for that is eDonkey. You
have a distributed hash table on top of, you know, the IP network. Every peer has a hash,
which is at the same time its ID, its address, and then you can look up data in the hash
table and so on. But I'm not going into detail about that.
One important thing when we talk about peer-to-peer networks is bootstrapping. Bootstrapping is
the process of establishing connectivity with the peer-to-peer network when a new peer comes
online. And that's a very important aspect. Because when
you think about that, you want to get rid of any central entities in your peer-to-peer
network. Right? So it might not be a good idea to have a seed server that all peers contact
to request an initial peer list. Right? That would be a central component and you don't
want to have that. So what people are doing is they deliver a seed list, a list of
other peers, together with the node itself. Right? So, for example, with the executable
that's then executed on the node's system.
But what happens if these peers go offline for some reason, or if they are not online, the
computers have been switched off or something? Then you need a fallback method. And that's
where it's getting interesting. And if you look at the box at the right-hand side, the
third entry is Conficker, which is a very famous or infamous piece of malware that was active
in 2009 and the following years and still is very active. And Conficker used random scanning.
It scanned the Internet for other peers randomly. And, of course, there's no way to block that.
There is no information that the bot relies on when it's first started. It just starts
scanning the Internet until it finds other peers and then it can learn other peers from
that one and so on. Do that recursively to establish connectivity with the network.
Speaking of that box, that's my own private history of peer-to-peer botnets I analyzed.
So I started in 2008, together with some other people, with the Storm worm, which used the
eDonkey network, or the Kademlia network or protocol. Some of those people are here
in this room. There are earlier
peer-to-peer botnets that are known. I think was active in 2007. And maybe there were
some others. But I think 2007 is the earliest I know of personally. Then there
was Waledac, which people believe is a successor of the Storm worm. Because Storm
caught a lot of attention from researchers, and lots of security people tried to investigate
Storm and tried to understand the protocol. Some even designed attacks, how you can attack
the P2P network to knock it offline, take the nodes offline. So apparently
the people behind it decided to abandon it at some point and create a
new botnet, and that was called Waledac, and that was not relying on any existing P2P
infrastructure. So no eDonkey anymore. Instead they implemented their own proprietary protocol,
which was, maybe I shouldn't say very similar to eDonkey, but
you know, the overall concept behind the botnet had similar structures and design characteristics.
So that's why people said it's probably a successor of Storm.
Then I already mentioned Conficker. Conficker was interesting because it started out as
a bot that was entirely centralized with its command and control infrastructure. Many
of you probably have heard about the DGA, the domain name generation algorithm that
it included. So it generated pseudo-random domain names all the time and then tried to
resolve these and contact that host and ask for, basically, updates. Later
on, in version C, the third version, they changed the name
and they switched to a P2P protocol as a fallback command and control channel, because there
was some effort to block access to the generated domains, so they needed something else. Otherwise
they would lose their 8-million-node botnet, right?
So that was Conficker. And then in 2010, I believe, late 2010, the Kelihos era started.
That's a bot that's also known as Hlux. Yeah, I think Hlux is the other
most well-known name. And that, again, is believed to be a successor of Waledac,
and that is because Waledac was taken down by some people and myself with a P2P poisoning
attack, and I will talk about that a little bit more in a minute.
So that botnet was taken away from them. So, again, they created a new one and that was
called Kelihos A. And that's actually interesting because if you look at the list, Kelihos A
was attacked as well, with success.
So they created Kelihos B, a successor, and tried to fix some stuff. That was taken down
as well. And, again, they created Kelihos C, a third version. We attacked that as well.
It wasn't too successful. It somewhat survived because we didn't manage to own all the peers.
And just recently they changed something in the protocol and added public/private key
encryption to it. Doesn't make sense at all because, you know, you might want to
encrypt your traffic, but you can do this with symmetric encryption. It doesn't make
sense to do public/private key stuff because the peers have to, you know, generate their
own keys and exchange keys and so on. And, I mean, anybody can do that, right? You can
still infiltrate the botnet by just doing the same, so it doesn't make sense. Anyhow.
Okay. And then in 2011 there was the Miner botnet, and I will show you some protocol examples
for that. A really stupid piece of malware that was used by a lot of people. It was
written in .NET, if I'm not mistaken, and the protocol was HTTP based,
so it was a plain text protocol, and they made several mistakes, so it was trivial to take
down. Okay. And the remaining two, ZeroAccess and peer-to-peer Zeus, are somewhat interesting
because they are still around and they are really successful. They are some of the biggest
and most prevalent botnets that are around these days and they are mostly used for dropping
other malware.
ZeroAccess is actually split into, I think, seven or eight separate botnets. Yeah, I don't
know why. Maybe they have some affiliate program or something. They also distinguish between
64 and 32-bit systems because they want to be able to, I don't know, inject DLLs into
other processes, so it might make sense to maintain two separate infrastructures.
Okay. Well, going back to my slide here, obviously people build peer-to-peer botnets
because they have the same goals as other people who build peer-to-peer
networks, right? They want to create an infrastructure that is resilient against takeover
attempts or takedown attempts. So that's the goal. And that's why, you know, they're
getting somewhat popular. I'm sure there are other peer-to-peer botnets out there that
are not on my list. I'm aware of a few but I haven't looked into them so I'm not going
to talk about them.
Interestingly, for, I think, all
botnets that you've seen on the previous list, the architecture is not
purely peer-to-peer. It's a hybrid architecture. It's what you see here. So the thing at the
bottom is the actual peer-to-peer network and the dashed lines represent a peer being
in the peer list of another peer. But when the peers want to
receive commands for, I don't know, sending out spam or something like that, they still
reach out to central components. And the boxes you see in the middle, can you see
that? Yeah. The boxes you see in the middle are proxy servers. So they usually have another
layer in between, like burner systems. So if some of the proxy servers get
taken down, they can easily replace them without losing their command and control infrastructure.
And then there is a command and control server on top that is the actual back end, okay?
There might actually be multiple layers between the P2P network and the C2, but, well, unless
you get access to one of the proxy servers, you don't see what's behind it. But we're
fairly certain that in most cases these are proxy servers because, you know, for example,
when they speak HTTP and they respond with an nginx banner, well, you can be certain
that it's a proxy.
Most likely, at least. Okay. Let's take a look at some protocol examples so you get
an idea what these people create and come up with. This is the already mentioned Miner
bot. And as I've said, that was really a trivial and also stupid protocol. It was HTTP
based and all the bots implemented their own tiny HTTP server. I mean, it wasn't
a full-blown HTTP server, but, you know, just a very, very
rudimentary one that was backed by the file system. So if you would issue a GET request
with the search parameter and the, you know, the IP list 2 value, that file name would
be looked up in the respective directory and then delivered to the requesting host. Okay?
So, I mean, if there were other files on the file system in that directory,
you could request them as well with this method. And that was probably not intended by them.
Yeah, anyhow, so you can see the response here. In that case, I think the nginx Server
header is fake. They just copied that from somewhere and sent it with the responses.
And you can see at the bottom is the actual payload, a list of other peers, a list of
IP addresses. And Miner always responds with the entire peer list that it has, right? All
peers that it knows about. And that's stupid because this can be huge. And also that makes
it easy for us to, you know, enumerate the bots and understand how many infected machines
there are and so on. If you want to attack it, for example. So, I mean, this is only
the start, right? You can see it's 11K in size and this is by far not the largest
response we've seen. You can try to recreate this peer-to-peer graph because it's basically
a graph, right? It's nodes who know about other nodes and talk to other nodes and so
on. You can try to recreate that by crawling peers. And we
will talk more about crawling. I mean, that's the topic of the talk, right? We'll talk more
about that in a minute. But if you request a peer list from one peer, you can recreate
these links in the graph and then take the IP addresses from the response
you got back and do the same for these and so on. And then, you know, plot pretty pictures
like this one here. I think that's about 37K nodes,
which is only a subset of the Miner botnet at that time. But it takes like ages to render
this picture here. So we only did that for a subset of the nodes we found.
You can see that other peer-to-peer protocols are somewhat similar. This is ZeroAccess
version one. There are two versions out there. This is the earlier version. And they define,
again, it's a proprietary protocol that they implemented, they define,
I think, six different message types. And one is getL, which means get list, get the peer
list from another peer. And retL is the return peer list message.
This is what you get when you reverse engineer the message format and decode it, I
mean, parse it. It's not plain text. I think ZeroAccess version one had a four-byte
key that it hashed with MD5 and then used that MD5 hash as an RC4 key to decrypt its messages.
But it's always the same key. So it was basically symmetric encryption with a static key. And
the other version, version two, just used XOR with another key.
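The key derivation described here, a short static key hashed with MD5 and the digest used as the RC4 key, can be sketched in Python roughly like this. The four key bytes and the sample payload below are placeholders made up for illustration, not the values used by the real malware, and RC4 is written out by hand because the standard library does not ship it.

```python
import hashlib

def rc4(key: bytes, data: bytes) -> bytes:
    """Plain RC4; encryption and decryption are the same operation."""
    # Key-scheduling algorithm
    s = list(range(256))
    j = 0
    for i in range(256):
        j = (j + s[i] + key[i % len(key)]) % 256
        s[i], s[j] = s[j], s[i]
    # Pseudo-random generation algorithm, XORed onto the data
    out = bytearray()
    i = j = 0
    for byte in data:
        i = (i + 1) % 256
        j = (j + s[i]) % 256
        s[i], s[j] = s[j], s[i]
        out.append(byte ^ s[(s[i] + s[j]) % 256])
    return bytes(out)

# Placeholder four-byte key (assumption, NOT the real botnet key).
STATIC_KEY = b"\x01\x02\x03\x04"

def derive_rc4_key(static_key: bytes) -> bytes:
    """Hash the short static key with MD5 and use the 16-byte digest as RC4 key."""
    return hashlib.md5(static_key).digest()

def crypt(message: bytes) -> bytes:
    return rc4(derive_rc4_key(STATIC_KEY), message)

wire = crypt(b"getL placeholder payload")
plain = crypt(wire)  # same static key, so anyone who knows it can decode
```

Because the key never changes, recovering it once from a sample is enough to read and forge every message.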
So if you undo the encryption, you end up with something like this. And you can see
here in the case of ZeroAccess version one, a peer list has 256 entries. So it always
returns up to 256 entries. But since the
botnet is large enough, every peer always has more than 256 entries at any time. So whenever
you ask a peer for its peer list, you will most likely get these 256 entries.
And you can see there's some order there. So the first number is a time stamp, or a time
delta, so to speak, because the botnet favors peers that have recently been active. And
that makes sense because, if that is your strategy, you don't
want to keep, like,
peers from the stone ages in your peer list that might be offline already or, you know,
they reboot from time to time, get a new IP address, so the entry becomes invalid. And
so you might want to favor peers that have recently come online or that you have recently
talked to. So that's why they sort these peer lists by the time delta and then return
the 256 most recent ones.
They changed this protocol a little bit in
version two. So this is ZeroAccess version two. And you can see there are, again, these
two message types. I've already mentioned that the encryption is slightly different.
But for the most part, the protocol is very similar. So there is getL and retL. Again,
you have the time stamps and you have the IP addresses. But they figured that they don't
need to send back 256 IP addresses. That's way too much, you know. It's sufficient if
you respond with only 16 IP addresses. That makes the messages smaller. So, you know,
less overall communication in the botnet. And the reason is, I mean, ZeroAccess version
two is really huge.
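The reply logic described for both versions, sort by time delta and return only the most recent entries, could look roughly like the following Python sketch. The `PeerEntry` type, the field names and the generated addresses are assumptions for the example; only the limits of 256 and 16 come from the talk.

```python
from typing import NamedTuple

class PeerEntry(NamedTuple):
    ip: str
    age: int  # time delta in seconds since the peer was last seen

def build_reply(peer_list, limit):
    """Return the `limit` most recently seen peers, freshest first."""
    return sorted(peer_list, key=lambda entry: entry.age)[:limit]

# 1000 made-up entries; (i * 37) % 1000 just spreads the ages around.
known = [PeerEntry(f"10.0.{i // 256}.{i % 256}", age=(i * 37) % 1000)
         for i in range(1000)]

reply_v1 = build_reply(known, limit=256)  # version one style reply
reply_v2 = build_reply(known, limit=16)   # version two style reply
print(len(reply_v1), len(reply_v2), reply_v2[0])
```

For a crawler this means a single request only ever reveals a small, recency-biased window of what a peer knows, so the same peer has to be asked repeatedly over time.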
We've crawled some of the botnets and they count like, you know, 3.7 million. I think
that was the count we got. 3.7 million infected machines. And if you have 3.7 million machines
talking to each other, that's a lot of traffic. So you might want to reduce the message size.
So that's what they did. But if you take a look at the IP addresses, you might notice
that the last octet looks a little bit strange. It's always very high. And that is because
they do some deduplication.
You don't want two or multiple entries with the same IP address in your peer list, obviously.
Because if you allow that, it's trivial for other people to poison your peer list and
inject one entry multiple times and overwrite all the legitimate ones. And then you're
not connected to the peer-to-peer botnet anymore, right?
So that's why they do deduplication. And in order to do that, they sort the IP addresses
and then, you know, go over the sorted list. And if they have two consecutive entries
that have the same IP address, they kick one out.
But because the IP addresses are, at least on PCs, treated as little-endian integers, you know, and they
sort them, you end up with these IP addresses with a high last octet in the response.
What's interesting is that they do that, but they don't filter out invalid IP addresses.
So when you crawl the botnet, you come across IP addresses like 255.255.255.255, so
all bits set, which obviously is an invalid IP address. But it regularly shows up in these
lists because, you know, when you sort the list in decreasing order, then it's the
topmost entry and it's always included. And they have some other garbage in there. So
for some reason they don't filter out these entries, which is interesting.
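To see why the last octet ends up so high, here is a small Python sketch of sort-based deduplication where the four address bytes are read as a little-endian integer. The function names and the sample addresses are made up for the example; the bot's real code was not shown in the talk.

```python
def as_le_int(ip: str) -> int:
    """Read the four address bytes (a.b.c.d) as a little-endian 32-bit integer.

    In this reading the LAST octet becomes the most significant byte.
    """
    return int.from_bytes(bytes(int(part) for part in ip.split(".")), "little")

def dedupe_and_truncate(ips, limit=16):
    """Sort in decreasing order, drop consecutive duplicates, keep the top entries."""
    ordered = sorted(ips, key=as_le_int, reverse=True)
    unique = []
    for ip in ordered:
        if not unique or as_le_int(unique[-1]) != as_le_int(ip):
            unique.append(ip)
    return unique[:limit]

peers = [
    "192.0.2.10",
    "198.51.100.250",
    "203.0.113.7",
    "198.51.100.250",   # duplicate, removed after sorting
    "255.255.255.255",  # invalid, but nothing filters it out
    "192.0.2.200",
]
print(dedupe_and_truncate(peers, limit=3))
```

With a limit of 3 this keeps 255.255.255.255, 198.51.100.250 and 192.0.2.200: the all-ones address always sorts to the top, and the rest of the reply is biased towards high last octets.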
Okay. Let's talk about crawling. So I mean, crawling is nothing else but recursively
enumerating peers. You start with one peer. You request its peer
list. You take a look at the response and do the same for all the returned addresses,
right, and so on until you, you know, want to go offline or I don't know. So that's all
that crawling is. But you really want to think about a crawling strategy. And one important
thing is crawling speed. So ideally we would be able to take a snapshot of the current
peer-to-peer graph and then, you know, enumerate the peers in that snapshot.
But that's not possible. First off, because, you know, you have to do that actively,
you have to send out requests and process the responses, and that takes time. And while
you're doing that, the structure of the graph might be changing, right? Peers might go offline,
new peers might come online. So you will never be able to get that snapshot, right? But to
come closest to that, you want to be as quick as possible.
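The recursive enumeration described here can be sketched in a few lines of Python. This is not the crawler code mentioned in the talk, just a minimal illustration: `request_peer_list` stands in for the protocol-specific request, and the fake network below uses documentation addresses so the sketch runs without touching any real host.

```python
from collections import deque

def crawl(seed, request_peer_list):
    """Recursively enumerate peers starting from `seed`.

    `request_peer_list(peer)` must return a list of addresses,
    or None if the peer did not answer.
    """
    known = {seed}       # every address we have learned about
    responsive = set()   # peers that actually answered
    edges = set()        # (peer, neighbor) links of the P2P graph
    queue = deque([seed])
    while queue:
        peer = queue.popleft()
        reply = request_peer_list(peer)
        if reply is None:
            continue     # unresponsive: known, but not reachable
        responsive.add(peer)
        for neighbor in reply:
            edges.add((peer, neighbor))
            if neighbor not in known:
                known.add(neighbor)
                queue.append(neighbor)
    return known, responsive, edges

# Hypothetical static network standing in for real peers.
FAKE_NETWORK = {
    "198.51.100.1": ["198.51.100.2", "198.51.100.3"],
    "198.51.100.2": ["198.51.100.1", "198.51.100.4"],
    "198.51.100.3": ["198.51.100.2", "203.0.113.9"],
    "198.51.100.4": ["198.51.100.1"],
}

def fake_request(peer):
    return FAKE_NETWORK.get(peer)  # None for peers that never answer

known, responsive, edges = crawl("198.51.100.1", fake_request)
print(len(known), "known,", len(responsive), "responsive,", len(edges), "edges")
```

A real crawler would issue these requests concurrently and revisit peers over time, since the graph keeps changing while the crawl runs.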
Yeah, and when you do that, you have to think about things like unresponsive peers. What
if somebody sends you an IP address back that's offline, how do you deal
with that? Do you want to keep it in the list and try again later? I mean, you don't know
why it's unresponsive, right? You might lose packets. The network might be overwhelmed
with your traffic because you try to be as fast as possible.
Or, yeah, there is some hiccup on the Internet. So you might want to keep
it in the list and try again later. But, you know, you can see it's getting a little
bit more complex.
And what you see in the top right corner is
the result of us crawling peer-to-peer Zeus, which is also known as Gameover, by
the way. And the red line, the red graph, shows you the number of IP addresses that we learned.
So we call them known peers. But most of them are not actually reachable, although
the protocol is pretty robust, so they don't include any invalid IP addresses in it.
So if you count only the peers that you can talk
to, you end up with the green line.
And you can see it's way less. And if you see these little
dips in the red line, that is because for peer-to-peer Zeus, we chose a strategy
where we cleaned up the list of known peers from time to time. So we said, okay, these
are unresponsive for too long now. Let's kick them out to keep the list small. Because
otherwise, you know, you have an endlessly growing list.
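One way to keep the two counts apart and to do that periodic cleanup is sketched below in Python. The class, its method names and the 600-second silence limit are assumptions chosen for the example, not the parameters used in the actual crawl.

```python
class PeerTable:
    """Tracks known peers and prunes the ones that stayed silent too long."""

    def __init__(self, max_silence):
        self.max_silence = max_silence
        self.first_seen = {}  # peer -> time we first learned its address
        self.last_reply = {}  # peer -> time of its most recent reply

    def learn(self, peer, now):
        self.first_seen.setdefault(peer, now)

    def replied(self, peer, now):
        self.learn(peer, now)
        self.last_reply[peer] = now

    def prune(self, now):
        """Drop peers that have not answered within max_silence; return how many."""
        stale = [p for p, t0 in self.first_seen.items()
                 if now - self.last_reply.get(p, t0) > self.max_silence]
        for p in stale:
            del self.first_seen[p]
            self.last_reply.pop(p, None)
        return len(stale)

    def counts(self, now):
        """Return (known peers, peers that answered recently)."""
        live = sum(1 for t in self.last_reply.values()
                   if now - t <= self.max_silence)
        return len(self.first_seen), live

table = PeerTable(max_silence=600)
for peer in ("a", "b", "c"):
    table.learn(peer, now=0)
table.replied("a", now=100)
table.replied("b", now=500)
before = table.counts(now=650)   # "c" is known but never answered
removed = table.prune(now=700)   # "c" has been silent for too long
after = table.counts(now=700)
```

Each call to `prune` produces one of those dips in the known-peers curve, while the count of responsive peers is unaffected.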
But what you can also see is that the green line converges very quickly. And that means
you have probably reached the number you are able to crawl. And that gives you some size
estimation.
Okay. There is some fancy animation
here. You might wonder why anybody wants to crawl peer-to-peer botnets at all. I
mean, it's interesting to play with that. It's interesting to understand the protocol
and reimplement it and so on. And that's a big part of it. But the goal is not just to
play with the botnet and maybe snoop on what they are doing. We usually have other
goals. I mean, reconnaissance is usually the foremost thing, right? But why do you want
to learn something about the peer-to-peer botnet and the infected machines? I've already mentioned
size estimation. If you talk to the press, they really like high numbers. So if you tell
them, you know, ZeroAccess is 10 million infected machines large, they will love that.
But next time you have to tell them the botnet is 15 million infected machines large or something.
So yeah, size estimation is one thing. But you have to be aware that you can only crawl
a subset of the infected machines. Most of them obviously are, you know, behind NAT,
behind gateways. You can't directly talk to them. You can't reach them from the Internet,
right? But they are still part of the peer-to-peer botnet. They are like leaf nodes in this
graph.
So it's not trivial. If you do what we did
for a period of time,
you end up with this green line and you get a number of machines that you can talk to, and
you have to extrapolate from that number to get to a more realistic size estimation.
Infection tracking is something that people are doing who want to remediate or, you know,
kill these botnets. They want to learn about infected machines and then can report the
IP addresses to, let's say, ISPs who then pass the information on to their customers
and hopefully they clean up the machines so the botnet dies.
But I've never really seen that being successful. Geographic distribution is something you can
also get from that. If you have all the IP addresses, you can do geolocation lookups
and then if you want to, plot them on a map like what we did here. And I want to mention
Mark Schlosser and some other guys who created the code we based this on. This is actually
a live thing. So we send in a live feed of the crawling results and that displays these
nice little red dots. Okay. But what we're usually after is we want
to attack P2P botnets. So, I mean, if you can, for example, if you know all the nodes,
you might want to try and send them commands yourself, right, if you also understood the
command and control protocol. There are sometimes interesting commands like uninstall commands.
If you can send an uninstall command to all the bots you've identified, and they are the
ones you can talk to, so it's the backbone of the whole graph, so to speak, right,
then you can kill the botnet entirely. Or if you can, I don't know, send requests for
more information about the infected machines, you can, for example, get information about
the operating system version or other stuff. So that's usually interesting as well.
But you can also probably manipulate the peer-to-peer infrastructure. So think about it. If you can
generate your own peer lists and then propagate these in the peer-to-peer network, you can
create edges, you can kill other edges by replacing them and so on. So you can basically,
you know, tamper with that infrastructure. And we will talk more about that in a little
bit. Ideally, I mean, you might be able to sinkhole the whole thing by replacing
all legitimate entries in the peer lists with your own ones and by that have all peers
talking to your own machines. Which means that nobody else has access to them anymore.
So if you think about crawling strategies, you might ask yourself, do I want to implement
a depth-first search or a breadth-first search? But it doesn't really matter, at least that's what we think,
because first off, it's not a tree, it's a graph. I mean, you can distinguish the two
strategies anyway, but it doesn't really matter because it's dynamic. So, you know, it's changing
all the time anyway, so it doesn't really matter which nodes you start with and which
nodes you continue with.
At some point, if you're fast enough, you will hopefully be able to learn
the biggest part of the reachable machines. If you track the infected machines, you need
to be able to distinguish: have I seen that IP address or have I seen that peer before,
or is it a new one that I want to include in my list? And if you rely on IP addresses
only, that's a bit of a problem because, as I've already mentioned,
there's a lot of IP churn, you know, IP addresses that change after 24 hours, and
if you happen to contact a peer and then the IP address changes and you
contact it again, you count it twice. So you want to avoid that, otherwise you get
skewed numbers. Some peer-to-peer protocols are nice, they implement unique IDs, especially
the ones that implement overlay networks because you need them for routing, right? And if
you have that, well, then you can have more accurate numbers.
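A tiny Python example of how IP churn inflates the count when no unique IDs are available. The observations below are invented for illustration; in a real crawl each pair would come from a decoded protocol reply.

```python
def count_population(observations):
    """Return (unique IP count, unique ID count) for (ip, bot_id) observations."""
    unique_ips = {ip for ip, _ in observations}
    unique_ids = {bot_id for _, bot_id in observations}
    return len(unique_ips), len(unique_ids)

# Made-up crawl log: three bots, two of which changed address during the crawl.
observations = [
    ("203.0.113.5", "bot-A"),
    ("203.0.113.77", "bot-A"),   # same bot after its IP address changed
    ("198.51.100.20", "bot-B"),
    ("198.51.100.20", "bot-B"),  # contacted twice, same address
    ("192.0.2.33", "bot-C"),
    ("192.0.2.90", "bot-C"),     # same bot after its IP address changed
]

ip_count, id_count = count_population(observations)
print(f"{ip_count} unique IPs, {id_count} unique IDs")
```

Counting addresses reports five machines here, counting IDs reports the actual three.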
Wow, he just gave it to me. Who knew? So part of the DEF CON experience is the best
technical talks delivered by the top speakers. It's very hard to get accepted to give a
talk here. You all should consider what you're doing to maybe become a speaker at some point.
This gentleman, this is his first time. Let's give him a big round of applause.
So we have another tradition at DEF CON: typically first-time speakers do a shot on stage.
So cheers.
And now we'll see if you can pick up the talk.
I can start off where I left off. I know that some of you have probably seen this many times.
I'm going to not make him do that entire speech next time.
Does my voice sound any different? Okay. Thank you. Okay.
I think I need one more to nullify the previous one. No.
There you go. Now you'll be better. Good job.
Okay. Let's finish this before the stuff kicks in.
Yeah, we already said that you're done with the crawling when this curve converges because
you don't learn about any new peers anymore. And if there are some changes then it's due
to churn.
So what you see here is an analysis of the
convergence for the P2P botnets we crawled. I hope you can read that. I realize it's rather
small. But on the left hand side you see curves similar to the one we had on the previous
slide, like the actual number of machines that we identified. And you can see,
I mean, it depends on the size of the botnet, of course. The
upper curves are ZeroAccess, which I already mentioned is pretty large, so you get way
more hits. And the ones at the bottom are, let me see. So that's a botnet called Sality
that I haven't looked into myself but one of my friends has. And he has provided these
numbers. So you can see depending on the size of the botnet, the scale is different.
But the shape is more or less the same. Right? So you can see that all of them kind of converge
towards a flat line and then you know you're more or less done.
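A simple way to turn "the curve has gone flat" into a stopping rule is to look at the percentage growth between samples, as in this Python sketch. The 1 percent threshold, the window of three samples and the sample counts are arbitrary choices for the example, not values from the measurements in the talk.

```python
def growth_percent(counts):
    """Percentage change between consecutive samples (counts must be non-zero)."""
    return [100.0 * (cur - prev) / prev for prev, cur in zip(counts, counts[1:])]

def converged_at(counts, threshold=1.0, window=3):
    """Index of the sample where growth first stays under `threshold` percent
    for `window` consecutive steps, or None if that never happens."""
    growth = growth_percent(counts)
    for i in range(len(growth) - window + 1):
        if all(abs(g) < threshold for g in growth[i:i + window]):
            return i + 1
    return None

# Made-up series of responsive-peer counts, one sample per crawl interval.
samples = [100, 400, 900, 1500, 1900, 1990, 2000, 2005, 2007, 2006]
stop = converged_at(samples)
print("converged at sample", stop, "with", samples[stop], "peers")
```

Small changes after that point, including slight decreases, are what you would expect from churn rather than from newly discovered peers.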
You can also take a look at the population increase in percent.
And that's what is displayed on the right hand side, which basically correlates with the other
graphs. Right? Yeah, so, by the way, I did mention that I'm going to release some
code after this presentation. So we figured that whenever we want to crawl a P2P botnet,
we end up writing the same code. So after some time we said, okay, let's build some
basic code that, you know, we can add the protocol implementation to, but, you know,
do it right once and then, you know, add the changing stuff to that. And I'm going
to release that as open source later on.
So, yeah, so how do you distinguish peers?
And I already talked about that. You have unique IP addresses versus
unique IDs. In the case where you don't have IDs, you can still derive
some, you know, conclusions from other cases where IDs are available. And what you see
here, I mean, I'm cheating a little bit here because these graphs are not generated by
crawling. This botnet, that's actually Kelihos C, so the last version that was attacked earlier
this year. These numbers are not generated by crawling the botnet, but in this case
we did node injection, so we propagated a special peer list entry in the peer-to-peer network
and then it became very prominent and then all the other peers reached out to that machine.
And by this you even get the ones that are not directly reachable because at some point
the entries propagate through NAT and through gateways and so on. So this gives you way
more accurate numbers. And that allows us to compare the IP address count with the ID
count. And what you see here is, so green is the total number of bots.
And blue is the number of unique IP addresses. And you can see that this goes up even though
we have seen almost all unique IDs. So the slope, or whatever it's called, is much flatter
for the green line. And that's actually very similar. So the ratio between the two after,
say, 24 hours or 48 hours is almost the same for all botnets we've taken a look at. And
I mean, we have a paper out on that.
You can take a look at all the numbers. But I'm not going to cover that here. So you can
see after 24 hours, that's where the two lines cross. So even if you don't have unique IDs,
you can say I take a look at the IP addresses I can collect in 24 hours and that gives me
probably pretty accurate numbers.
Yeah, I already mentioned speed. Speed is
important. You want to be as fast as possible. But being fast is not easy.
I mean, if the protocol is UDP based, it's a little bit easier because you don't have
to worry about session establishment and so on and timeouts. Actually, I didn't get to
finish the UDP code. Most of these botnets use UDP for a reason. I mean, the overhead
is less. But I didn't get to finish the crawler template code for UDP. So that's left as
an exercise for you or you wait until I'm done with it and check it into the repo. But
UDP is way simpler.
Usually people have two threads, one that sends out messages and one that consumes
incoming messages. And actually many bots work that way, actually
most of the UDP ones we have seen. But if you do that, you have to worry about synchronization.
So you have to, you know, have a peer list that you lock when you want to send out stuff
or, you know, select a peer that you want to send data to, and when you receive data you
also probably want to lock the peer list. So you have to synchronize the two. So we usually
go with a main loop and just a single thread because it's faster. The code is a little bit
more complex. But, yeah.
When you're talking TCP, it's, yeah, a little
bit more difficult. You have to establish TCP connections and you have to worry about
timeouts because you don't want to get dust, right? If you don't worry about all these
things and you crawl the network, they might, like, open ‑‑ create half open connections
and not respond to you at all or keep connections open forever that are established. And then,
you're running out of file descriptors and your crawling doesn't work anymore. So you
probably want to have, like, a limited set of file descriptors or sessions that you're
able to handle. So what we do, what the code does that I'm going to share publicly is
it allocates a fixed number of slots for sessions and, I mean, that's the amount of simultaneous
sessions the code can handle. And, you know, when it wants to contact a new peer, it takes
the next free slot from that array. So by that, you make sure that you have the
space, you make sure that your crawler doesn't get dust. Yeah, I talked about timeouts already.
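The fixed slot array described above can be sketched roughly as follows. This is a minimal illustration in Python, not the actual Prowler code: the names `SlotTable`, `Session`, `MAX_SESSIONS` and `SESSION_TIMEOUT` are made up for this example, and a real crawler would also keep the socket in each slot and close it on release.

```python
import time
from dataclasses import dataclass

MAX_SESSIONS = 4         # hypothetical value; tune to your file descriptor budget
SESSION_TIMEOUT = 10.0   # hypothetical value, in seconds


@dataclass
class Session:
    peer: tuple          # (ip, port) of the peer being contacted
    started: float       # time at which the slot was taken


class SlotTable:
    """Fixed-size array of session slots; None marks a free slot."""

    def __init__(self, size=MAX_SESSIONS, timeout=SESSION_TIMEOUT):
        self.slots = [None] * size
        self.timeout = timeout

    def acquire(self, peer, now=None):
        """Take the next free slot for peer; return its index, or None if all are busy."""
        now = time.monotonic() if now is None else now
        for i, slot in enumerate(self.slots):
            if slot is None:
                self.slots[i] = Session(peer, now)
                return i
        return None

    def release(self, index):
        """Free a slot once the session is finished."""
        self.slots[index] = None

    def expire(self, now=None):
        """Free all slots held longer than the timeout; return the affected peers."""
        now = time.monotonic() if now is None else now
        expired = []
        for i, slot in enumerate(self.slots):
            if slot is not None and now - slot.started > self.timeout:
                expired.append(slot.peer)
                self.slots[i] = None
        return expired
```

Because the array never grows, a peer that stalls or never answers can hold at most one slot until `expire` reclaims it, so the number of open sessions stays bounded.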
Another thing is, if you talk to a peer, then you can definitely say
that it's live, that it exists, right? And the question is
how long do you want to keep it in your peer list flagged as active because, as I've said
previously, you want to distinguish between the
IP addresses or peers that you have encountered, that you have observed, and the ones that
you can actually talk to, that are live, right? But, yeah, if you talk to a peer and it's live,
for how long do you want to consider it live? So that's another thing. I mean, do you want
to consider it live for 24 hours or only three minutes, or do you want to periodically recontact
it, and if it doesn't respond anymore then you say it's not live anymore? So these are
parameters that are really, really important. I mean, it might not sound like that, but
they are really important and you might want to tune them for the specific botnet that
you are crawling to get accurate numbers. Yeah, also, I mean, packet loss, especially
when you're talking UDP, I mean, you can send out lots of UDP packets in a short time and if you
fill up your own line, your own pipe with UDP packets, you will have packet loss at some
point and then you get funny results. So, yeah, either get a bigger line, more bandwidth,
or you will have packet loss. So, yeah, that's one of the things that you need to consider.
Or slow down a little bit. So you want to have a parameter that allows you to slow down
the whole crawling process.
So Prowler is the name of the tool that we're
going to release today. As I said, it just implements the crawling framework, so to speak,
and you have to add the protocol implementation yourself. It provides you with some stub functions
that get called and that's where you have to implement the protocol. So if you want
to check it out, please do. As I've said, it's only TCP for now. Yeah. And you can see
what it looks like at the bottom of the slide. You can even see that it distinguishes between
known peers and active peers. And if you take
a look at the last two lines, you can see that the number of active peers goes down
from 719 to 717, and that is because, you know, after some time, some peers don't respond
anymore, so they're not considered active anymore and get flagged as inactive.
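The known versus active distinction, and the way the active count drops when peers stop responding, can be sketched like this. It is only an illustration under assumptions: `PeerTable` and `ACTIVE_WINDOW` are hypothetical names, and the three-minute window is just one of the values mentioned earlier, not what Prowler actually uses.

```python
import time

ACTIVE_WINDOW = 180.0  # hypothetical: seconds a peer stays "active" after its last reply


class PeerTable:
    """Tracks every peer ever seen and when each one last replied to us."""

    def __init__(self, active_window=ACTIVE_WINDOW):
        self.active_window = active_window
        # peer -> time of last reply, or None if only seen in someone's peer list
        self.last_reply = {}

    def add_known(self, peer):
        """Record a peer learned from a peer list without touching its reply time."""
        self.last_reply.setdefault(peer, None)

    def mark_reply(self, peer, now=None):
        """Record that the peer answered us just now."""
        self.last_reply[peer] = time.monotonic() if now is None else now

    def known_count(self):
        return len(self.last_reply)

    def active_count(self, now=None):
        """Count peers whose last reply falls inside the active window."""
        now = time.monotonic() if now is None else now
        return sum(
            1
            for t in self.last_reply.values()
            if t is not None and now - t <= self.active_window
        )
```

With a short window the active count tracks churn closely but needs frequent recontacting; with a long window it overcounts peers that have already gone offline.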
And in that case, we were crawling Kelihos C, and that was in February. So the
peer list I started off with only contained two entries. You see that on the right-hand
side. And Kelihos, if you request a peer list, always shares 250 entries.
And that is why, if you take a look at the first line, you know, with
Kelihos, it goes up to 250 known peers. It contacts one peer, it learns 250 entries,
so it knows 250 other ones immediately. And then it continues from there.
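The way the crawl grows from a small seed list can be sketched as a simple breadth-first loop. This is a hedged illustration, not the Prowler implementation: `crawl` and `request_peer_list` are hypothetical names, and `request_peer_list` stands in for the botnet-specific protocol code, returning a list of peers or `None` if the peer does not answer.

```python
from collections import deque


def crawl(seeds, request_peer_list, max_contacts=1000):
    """Breadth-first crawl: contact each known peer once and learn its peer list.

    request_peer_list(peer) is an assumed callback that returns the peer's
    list of entries, or None when the peer does not respond.
    """
    known = set(seeds)       # every peer seen so far, reachable or not
    active = set()           # peers that actually answered
    queue = deque(seeds)     # peers still to be contacted
    contacts = 0

    while queue and contacts < max_contacts:
        peer = queue.popleft()
        contacts += 1
        entries = request_peer_list(peer)
        if entries is None:
            continue         # no answer: the peer stays known but not active
        active.add(peer)
        for entry in entries:
            if entry not in known:
                known.add(entry)
                queue.append(entry)

    return known, active
```

For example, with a toy network given as a dictionary, `crawl(["a"], net.get)` treats every peer missing from `net` as unresponsive.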
But if you take a look at the two graphs, again, the green line is active peers that
it can talk to, and the red line is peers that I have seen in peer lists. You can see
that the green line becomes constant very quickly. So it converges really quickly and, you know,
it's somewhere in the range of, I don't know, what is that, 700? Yeah, that's in line with
the numbers below. And that is because Kelihos also favors more recent peers.
So they have this backbone of what they call router nodes, and there are
never more than around 700 of them. So that's why we'll never be able to talk to more than
around 700 peers at a time. And you can also see these, I don't know,
steps or whatever you want to call them in the red curve.
And that is because if new peers come online, they propagate in the peer-to-peer network
and become active at some point, and then, you know, you get these steps. Because
when a new peer comes online, it immediately gets propagated to all peers
that are online, and that's what causes this effect. Okay. So I'm almost done here. This
is the Git repository where you can check out the code. As I've said, I will hopefully
add a UDP version soon.
I checked in that version like one hour before the talk, so there might be some
bugs in there. But if you tell me that there's something buggy, I will fix
it, or you can fix it yourself and send me a patch. But I also want to talk about the
alternative that we already touched on briefly, which is node injection. As I've said, by
crawling, you will never be able to reach the peers that are behind gateways and so
on. So you can actively participate in the peer-to-peer network as an alternative and
propagate your own IP addresses. And then at some point, depending on the popularity
of your node, the other peers will reach out to you and, you know, say, take me down or
send me commands. Yeah. And that's actually a comparison here between tracking based on
sensor injection and crawling. Again, this is
peer-to-peer Zeus, so we have unique IDs and IP addresses, and we distinguish between
the numbers for unique IDs and IP addresses. Of course, the number of IP addresses is much
higher. The top two lines are what we achieved through sensor injection, and the
other lines are what we achieved through crawling. And the bottom lines are the active IP addresses.
So you see it's much less than the peers that show up in the peer list.
Okay. That's basically my presentation. I want to give shouts to some people here because
they're awesome. They did some of the work with me and deserve credit for it. And
that's it. I think we have a few more minutes left, maybe three or so. So if you have any
questions, you can ask me now or hunt me down at the bar later on.
Thank you.
